Approaching a Smart Sharing of Resources in SMT Processors

Authors

  • Francisco J. Cazorla
  • Enrique Fernandez
  • Alex Ramirez
  • Mateo Valero
Abstract

SMT processors increase performance by executing instructions from several threads simultaneously. These threads use the processor's resources better by sharing them, but at the same time they compete for those resources. The way critical resources are distributed among threads determines the final throughput and also the performance of each individual thread. Currently, the processor's instruction fetch policy decides each cycle which threads enter the processor to compete for resources. However, these fetch policies use only indirect indicators of how resource allocation is carried out. This may cause resource monopolization by a single thread, or wasted resources when no thread can use them. Both situations can harm processor performance and occur, for example, after an L2 cache miss. This paper is a first step toward dynamic resource allocation for SMT processors. We show that being aware of resource demand and directly controlling resource assignment significantly improves the performance of SMT processors. We introduce for the first time the concept of a resource allocation policy in order to provide such control. Our results show that our resource allocation policy outperforms the best published fetch policies, such as FLUSH, in throughput and fairness by 7% on average. In addition, unlike FLUSH, our resource allocation policy does not need to squash instructions from the pipeline in order to achieve this performance improvement. As a result, it reduces dynamic power consumption and hardware complexity.
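
The monopolization problem described in the abstract can be illustrated with a toy model. The sketch below is not the paper's policy or simulator: the function `simulate`, the parameters `queue_size`, `cap`, and `stall`, and the two-entries-per-cycle rates are all assumptions chosen for illustration. It models two threads sharing one queue, with thread 0 unable to retire during a fixed window (a stand-in for an outstanding L2 miss), and compares no direct control against a hypothetical per-thread entry limit.

```python
def simulate(cycles, queue_size, cap=None, stall=(10, 60)):
    """Toy model of two threads sharing one queue of `queue_size` entries.

    Every cycle each thread first retires up to 2 of its entries, then
    tries to allocate up to 2 new ones. Thread 0 cannot retire while
    stall[0] <= cycle < stall[1] (illustrative stand-in for an L2 miss).
    `cap` is a hypothetical per-thread entry limit; None means allocation
    is limited only by free space. Returns entries retired per thread.
    """
    held = [0, 0]      # entries currently occupied by each thread
    retired = [0, 0]   # entries retired so far by each thread
    for cycle in range(cycles):
        # Retire phase: a stalled thread keeps holding its entries.
        for t in (0, 1):
            stalled = (t == 0) and (stall[0] <= cycle < stall[1])
            if not stalled:
                done = min(2, held[t])
                held[t] -= done
                retired[t] += done
        # Allocate phase: alternate which thread goes first each cycle.
        order = (0, 1) if cycle % 2 == 0 else (1, 0)
        for t in order:
            free = queue_size - sum(held)
            limit = free if cap is None else min(free, cap - held[t])
            held[t] += max(0, min(2, limit))
    return retired


if __name__ == "__main__":
    uncapped = simulate(100, 16)          # no direct allocation control
    capped = simulate(100, 16, cap=8)     # each thread limited to 8 entries
    print("retired per thread, no cap:", uncapped)
    print("retired per thread, cap=8: ", capped)
```

In this toy setting the stalled thread fills the whole queue when no cap is applied, so the other thread cannot allocate until the stall ends; with the cap, the other thread keeps making progress. A static cap is only the simplest possible form of direct control and says nothing about how the paper's dynamic policy decides allocations.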

Similar resources

Paired ROBs: A Cost-Effective Reorder Buffer Sharing Strategy for SMT Processors

An important design issue of SMT processors is to find proper sharing strategies of resources among threads. This paper proposes a ROB sharing strategy, called paired ROB, that considers the fact that task parallelism is not always available to fully utilize resources of multithreaded processors. To this aim, an evaluation methodology is proposed and used for the experiments, which analyzes per...

Understanding the Impact of Inter-Thread Cache Interference on ILP in Modern SMT Processors

Simultaneous Multithreading (SMT) has emerged as an effective method of increasing utilization of resources in modern super-scalar processors. SMT processors increase instruction-level parallelism (ILP) and resource utilization by simultaneously executing instructions from multiple independent threads. Although simultaneously sharing resources benefits system throughput, coscheduled threads oft...

Improving Memory Latency Aware Fetch Policies for SMT Processors

In SMT processors several threads run simultaneously to increase available ILP, sharing but competing for resources. The instruction fetch policy plays a key role, determining how shared resources are allocated. When a thread experiences an L2 miss, critical resources can be monopolized for a long time choking the execution of the remaining threads. A primary task of the instruction fetch polic...

CASH: Revisiting Hardware Sharing in Single-Chip Parallel Processors

As increasing the issue width yields diminishing returns in superscalar processors, thread parallelism on a single chip is becoming a reality. In the past few years, both SMT (Simultaneous MultiThreading) and CMP (Chip MultiProcessor) approaches were first investigated by academics and are now implemented by the industry. In some sense, CMP and SMT represent two extreme design points. In thi...

Evaluating Branch Predictors on an SMT Processor

Simultaneous multithreading (SMT) provides significant increases in microprocessor throughput by issuing instructions from multiple threads per clock cycle. SMT can be realized in a wide-issue superscalar with a modest increase in resources, because much of the hardware is shared among the multiple thread contexts. Branch prediction accuracy, a key component of microprocessor performance, can s...

Publication date: 2004